INSTRUCTIONS FOR MASKING THE BEGINNING AND END OF THE NON-TRANSACTIONAL CODE REGION REQUIRING WRITEBACK FOR PERSISTENT STORAGE
Patent Abstract:
Instructions for Masking the Beginning and End of the Non-Transactional Code Region Requiring Writeback for Persistent Storage

The present invention describes a processor having an interface for non-volatile random access memory and logic circuitry. The logic circuitry serves to identify cache lines modified by a transaction that views the non-volatile random access memory as the transaction's persistence store. The logic circuitry also serves to identify cache lines modified by a software process other than a transaction that likewise views the non-volatile random access memory as persistence storage.

Publication number: BR102014006298A2
Application number: R102014006298-0
Filing date: 2014-03-17
Publication date: 2018-02-14
Inventor: Willhalm Thomas
Applicant: Intel Corporation
Primary IPC class:
Patent Description:
(54) Title: INSTRUCTIONS FOR MASKING THE BEGINNING AND END OF THE NON-TRANSACTIONAL CODE REGION THAT REQUIRES PERSISTENT STORAGE WRITEBACK
(51) Int. Cl.: G06F 9/46; G06F 12/0811; G06F 12/08
(52) CPC: G06F 9/46, G06F 12/0811, G06F 12/08
(30) Union Priority: 3/15/2013 US 13/843,760
(73) Holder(s): INTEL CORPORATION
(72) Inventor(s): THOMAS WILLHALM
(74) Attorney(s): PINHEIRO NETO ADVOGADOS
(57) Summary: INSTRUCTIONS FOR MASKING THE BEGINNING AND END OF THE NON-TRANSACTIONAL CODE REGION THAT REQUIRES PERSISTENT STORAGE WRITEBACK. The present invention describes a processor having an interface for non-volatile random access memory and logic circuitry. The logic circuitry serves to identify cache lines modified by a transaction that views the non-volatile random access memory as the persistence store for the transaction. The logic circuitry also serves to identify cache lines modified by a software process other than a transaction that likewise views the non-volatile random access memory as persistence storage.

INSTRUCTIONS FOR MASKING THE BEGINNING AND END OF THE NON-TRANSACTIONAL CODE REGION THAT REQUIRES PERSISTENT STORAGE WRITEBACK

FIELD OF THE INVENTION

[0001] This invention relates generally to the field of computer systems. More specifically, the invention relates to an apparatus and method for implementing a multilevel memory hierarchy that includes a level of non-volatile memory.

DESCRIPTION OF RELATED ART

A. Current Storage and Memory Configurations

[0002] One of the limiting factors for computer innovation today is storage and memory technology. In conventional computer systems, system memory (also known as main memory, primary memory, or executable memory) is typically implemented with dynamic random access memory (DRAM). DRAM-based memory consumes power even when no reads or writes to the memory occur, because it must constantly recharge its internal capacitors. DRAM-based memory is volatile, which means that data stored in DRAM is lost once power is removed.
Conventional computer systems also rely on multiple levels of caching to improve performance. A cache is a high-speed memory positioned between the processor and system memory in order to service memory access requests faster than they could be serviced from system memory. Such caches are typically implemented with static random access memory (SRAM). Cache management protocols can be used to ensure that the most frequently accessed data and instructions are stored within one of the cache levels, thereby reducing the number of memory access operations and improving performance.

[0003] With respect to mass storage (also known as secondary storage or disk storage), conventional mass storage devices typically include magnetic media (for example, hard disk drives), optical media (for example, compact disc (CD) drives, digital versatile disc (DVD) drives, etc.), holographic media, and/or mass storage flash memory (for example, solid state drives (SSDs), removable flash drives, etc.). These storage devices are generally considered Input/Output (I/O) devices because they are accessed by the processor through various I/O adapters that implement various I/O protocols. These I/O adapters and I/O protocols consume a significant amount of power and can have a significant impact on the die area and the form factor of the platform. Portable or mobile devices (for example, laptops, netbooks, tablet computers, personal digital assistants (PDAs), portable media players, portable gaming devices, digital cameras, mobile phones, smartphones, feature phones, etc.) that have limited battery life when not connected to a permanent power source may include removable mass storage devices (for example, embedded MultiMediaCard (eMMC), Secure Digital (SD) card) that are typically coupled to the processor through low-power interconnects and I/O controllers in order to meet active and idle power budgets.
[0004] With respect to firmware memory (such as boot memory (also known as BIOS flash)), a conventional computer system typically uses flash memory devices to store persistent system information that is read frequently but seldom (or never) written. For example, the initial instructions executed by a processor to initialize key system components during a boot process (Basic Input and Output System (BIOS) images) are typically stored in a flash memory device. Flash memory devices currently available in the market generally have limited speed (for example, 50 MHz). This speed is further reduced by the overhead of read protocols (for example, 2.5 MHz). In order to speed up BIOS execution, conventional processors generally cache a portion of the BIOS code during the Pre-Extensible Firmware Interface (PEI) phase of the boot process. The size of the processor cache places a restriction on the size of the BIOS code used in the PEI phase (also known as the PEI BIOS code).

B. Phase Change Memory (PCM) and Related Technologies

[0005] Phase change memory (PCM), sometimes also referred to as phase change random access memory (PRAM or PCRAM), PCME, Ovonic Unified Memory, or chalcogenide RAM (C-RAM), is a type of non-volatile computer memory that exploits the unique behavior of chalcogenide glass. As a result of the heat produced by the passage of an electric current, chalcogenide glass can be switched between two states: crystalline and amorphous. Recent versions of PCM can achieve two additional distinct states.
[0006] PCM provides higher performance than flash because the memory element in PCM can be changed more quickly, writing (changing individual bits to 1 or 0) can be done without the need to first erase an entire block of cells, and degradation from writes is slower (a PCM device can survive approximately 100 million write cycles; PCM degradation is due to thermal expansion during programming, metal (and other material) migration, and other mechanisms).

BRIEF DESCRIPTION OF THE DRAWINGS

[0007] The following description and the attached drawings are used to illustrate embodiments of the invention. In the drawings:

[0008] Figure 1 illustrates a system memory and cache arrangement according to an embodiment of the invention;

[0009] Figure 2 illustrates a storage and memory hierarchy employed in an embodiment of the invention;

[0010] Figure 3 illustrates a computer system on which embodiments of the invention can be implemented;

[0011] Figure 4 illustrates a transaction process;

[0012] Figure 5 illustrates a process having special hardware to monitor the changes made in the cache;

[0013] Figure 6 shows a process that uses the special hardware of Figure 5 to write non-transactional data changes to persistence storage;

[0014] Figure 7 shows an integrated process illustrating that the special hardware of Figure 5 can be used to support both transactional rollbacks and non-transactional writes to persistence;

[0015] Figure 8 shows a compilation process.

DETAILED DESCRIPTION

[0016] In the following description, numerous specific details such as logical implementations, operation codes, means for specifying operands, resource partitioning/sharing/duplication implementations, types of interrelationships of system components, and logical partitioning/integration options are presented to provide a more complete understanding of the present invention.
It will be appreciated, however, by those skilled in the art that the invention can be practiced without such specific details. In other instances, control structures, gate-level circuits, and full software instruction sequences have not been shown in detail in order not to obscure the invention. Those of ordinary skill in the art, with the included descriptions, will be able to implement the appropriate functionality without undue experimentation.

[0017] References in the specification to one embodiment, an embodiment, an exemplary embodiment, etc., indicate that the described embodiment may include a specific aspect, structure, or characteristic, but every embodiment may not necessarily include that specific aspect, structure, or characteristic. In addition, such phrases do not necessarily refer to the same embodiment. Additionally, when a specific aspect, structure, or characteristic is described in connection with an embodiment, it is considered to be within the knowledge of those skilled in the art to effect that aspect, structure, or characteristic in connection with other embodiments whether or not they are explicitly described.

[0018] In the following description and claims, the terms coupled and connected, together with their derivatives, may be used. It should be understood that these terms are not intended to be synonyms for each other. Coupled is used to indicate that two or more elements, which may or may not be in direct physical or electrical contact with each other, cooperate or interact with each other. Connected is used to indicate the establishment of communication between two or more elements that are coupled to each other.

[0019] Text in parentheses and blocks with dashed borders (for example, large dashes, small dashes, dash-dot, dots) are sometimes used here to illustrate optional operations/components that add optional aspects to embodiments of the invention.
However, such notation should not be taken to mean that these are the only options or optional operations/components, and/or that blocks with solid borders are not optional in certain embodiments of the invention.

INTRODUCTION

[0020] Memory capacity and performance requirements continue to increase with a growing number of processor cores and new usage models such as virtualization. In addition, memory power and cost have become significant components of the overall power and cost, respectively, of electronic systems.

[0021] Some embodiments solve the above challenges by intelligently subdividing the performance requirement and the capacity requirement among memory technologies. The focus of this approach is on providing performance with a relatively small amount of relatively higher-speed memory such as DRAM, while implementing the bulk of system memory using significantly denser non-volatile random access memory (NVRAM). Embodiments of the invention described below define platform configurations that enable hierarchical memory subsystem organizations for the use of NVRAM. The use of NVRAM in the memory hierarchy also enables new uses such as expanded boot space and mass storage implementations.

[0022] Figure 1 illustrates a system memory and cache arrangement according to embodiments of the invention. Specifically, Figure 1 shows a memory hierarchy including a set of internal processor caches 120, near memory acting as a far memory cache 121, which can include both internal cache(s) 106 and external caches 107-109, and far memory 122. A specific type of memory that can be used for far memory in some embodiments of the invention is non-volatile random access memory (NVRAM). As such, an overview of NVRAM is provided below, followed by an overview of far memory and near memory.

A.
Non-Volatile Random Access Memory (NVRAM)

[0023] There are many possible technology choices for NVRAM, including PCM, phase change memory and switch (PCMS) (the latter being a more specific implementation of the former), byte-addressable persistent memory (BPRAM), storage class memory (SCM), universal memory, Ge2Sb2Te5, programmable metallization cell (PMC), resistive random access memory (RRAM), RESET (amorphous) cell, SET (crystalline) cell, PCME, Ovshinsky memory, ferroelectric memory (also known as polymer memory and poly(N-vinylcarbazole) memory), ferromagnetic memory (also known as Spintronics, SPRAM (spin-transfer torque RAM), STRAM (spin-tunneling RAM), magnetoresistive memory, magnetic memory, magnetic random access memory (MRAM)), and semiconductor-oxide-nitride-oxide-semiconductor (SONOS, also known as dielectric memory).

[0024] NVRAM has the following characteristics:

(1) It maintains its content even if power is removed, similar to the FLASH memory used in solid state drives (SSDs), and unlike SRAM and DRAM, which are volatile;

(2) lower power consumption than volatile memories such as SRAM and DRAM;

(3) random access similar to SRAM and DRAM (also known as randomly addressable);

(4) rewritable and erasable at a lower level of granularity (for example, byte level) than the FLASH found in SSDs (which can only be rewritten and erased one block at a time - minimally 64 Kbytes in size for NOR FLASH and 16 Kbytes for NAND FLASH);

(5) usable as system memory and allocated all or a portion of the system memory address space;

(6) capable of being coupled to the processor via a bus using a protocol that supports transaction identifiers (IDs) to support out-of-order operation, and allowing access at a level of granularity small enough to support operation of the NVRAM as system memory (for example, a cache line size such as 64 or 128 bytes). For example, the bus may be an out-of-order memory bus (for example, a DDR bus such as DDR3, DDR4, etc.).
As another example, the bus can be a PCI express (PCIE) bus, a desktop management interface (DMI) bus, or any other type of bus using an out-of-order protocol and a sufficiently small payload size (for example, a cache line size such as 64 or 128 bytes); and

(7) one or more of the following:

a) higher write speed than non-volatile storage/memory technologies such as FLASH;

b) very high read speed (higher than FLASH and close or equivalent to DRAM read speeds);

c) directly writable (rather than requiring erasure (overwriting with 1s) before writing data, like the FLASH memory used in SSDs);

d) a larger number of writes before failure (more than the boot ROM and the FLASH used in SSDs).

[0025] As mentioned above, in contrast with FLASH memory, which must be rewritten and erased a whole block at a time, the level of granularity at which NVRAM is accessed in any given implementation may depend on the specific memory controller and the specific memory bus or other type of bus to which the NVRAM is coupled. For example, in some implementations where NVRAM is used as system memory, the NVRAM may be accessed at the granularity of a cache line (for example, a 64-byte or 128-byte cache line), despite an inherent capability of being accessed at the granularity of a byte, because the cache line is the level at which the memory subsystem accesses memory. Thus, when NVRAM is deployed within a memory subsystem, it can be accessed at the same level of granularity as the DRAM (for example, the near memory) used in the same memory subsystem. Even so, the level of granularity of access to the NVRAM by the memory controller and memory bus or other type of bus is smaller than the block size used by flash and the access size of the I/O subsystem's controller and bus.
[0026] NVRAM may also incorporate wear leveling algorithms to account for the fact that the storage cells at the far memory level begin to wear out after a number of write accesses, especially where a significant number of writes may occur, as in a system memory implementation. Since high-cycle-count blocks are most likely to wear out in this manner, wear leveling spreads writes across the far memory cells by swapping the addresses of high-cycle-count blocks with the addresses of low-cycle-count blocks. Note that most address swapping is typically transparent to application programs because it is handled by hardware, by low-level software (for example, a low-level driver or operating system), or by a combination of the two.

B. Far Memory

[0027] The far memory 122 of some embodiments of the invention is implemented with NVRAM, but is not necessarily limited to any specific memory technology. Far memory 122 is distinguishable from other data and instruction storage/memory technologies in terms of its characteristics and/or its application in the storage/memory hierarchy. For example, far memory 122 is different from:

1) static random access memory (SRAM), which may be used for the level 0 and level 1 internal processor caches 101a-b, 102a-b, 103a-b, and 104a-b dedicated to the processor cores 101-104, respectively, and the lower-level cache (LLC) 105 shared by the processor cores;

2) dynamic random access memory (DRAM) configured as a cache 106 internal to the processor 100 (for example, on the same chip as the processor 100) and/or configured as one or more caches 107-109 external to the processor (for example, in the same package as or a different package from the processor 100); and

3) FLASH memory/magnetic disk/optical disk applied as mass storage (not shown); and

4) memory such as FLASH memory or other read-only memory (ROM) applied as firmware memory (which can refer to boot ROM, BIOS Flash, and/or TPM Flash) (not shown).
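The address-swapping wear leveling described in paragraph [0026] can be sketched as a small simulation. This is an illustrative model only, not the patent's implementation: the indirection table, the write counters, and the swap threshold are all assumptions introduced here to show how remapping hot blocks onto cold blocks stays transparent to the application, which keeps using the same logical address.

```python
# Illustrative wear-leveling sketch (assumed structures, not from the patent):
# an indirection table remaps logical blocks to physical blocks, and a block
# whose write count runs far ahead of the coldest block trades places with it.
class WearLeveler:
    def __init__(self, num_blocks, swap_threshold=100):
        self.remap = list(range(num_blocks))   # logical block -> physical block
        self.write_counts = [0] * num_blocks   # wear counter per physical block
        self.swap_threshold = swap_threshold

    def physical_block(self, logical_block):
        return self.remap[logical_block]

    def write(self, logical_block):
        phys = self.remap[logical_block]
        self.write_counts[phys] += 1
        # Find the least-worn physical block.
        cold = min(range(len(self.write_counts)),
                   key=self.write_counts.__getitem__)
        # If this block's wear runs too far ahead, swap the two mappings so
        # future writes to this logical address land on the cold block.
        if self.write_counts[phys] - self.write_counts[cold] >= self.swap_threshold:
            cold_logical = self.remap.index(cold)
            self.remap[logical_block], self.remap[cold_logical] = cold, phys
        return self.remap[logical_block]
```

The application never sees the swap: it continues writing to the same logical block while the hardware (or a low-level driver) silently redirects the traffic, matching the transparency noted at the end of [0026].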
[0028] Far memory 122 can be used as instruction and data storage that is directly addressable by the processor 100 and is capable of sufficiently keeping pace with the processor 100, in contrast to FLASH/magnetic disk/optical disk applied as mass storage. Moreover, as discussed above and described in detail below, far memory 122 can be placed on a memory bus and can communicate directly with a memory controller that, in turn, communicates directly with the processor 100.

[0029] Far memory 122 can be combined with other instruction and data storage technologies (for example, DRAM) to form hybrid memories (also known as co-locating PCM and DRAM; first-level memory and second-level memory; FLAM (FLASH and DRAM)). Note that at least some of the above technologies, including PCM/PCMS, can be used for mass storage instead of, or in addition to, system memory, and need not be randomly accessible, byte addressable, or directly addressable by the processor when applied in this manner.

[0030] For convenience of explanation, most of the remainder of the application will refer to NVRAM or, more specifically, PCM or PCMS as the technology selection for far memory 122. As such, the terms NVRAM, PCM, PCMS, and far memory may be used interchangeably in the following discussion. However, it should be noted, as discussed above, that different technologies can also be used for far memory. In addition, NVRAM is not limited to use as far memory.

C. Near Memory

[0031] Near memory 121 is an intermediate level of memory configured in front of far memory 122 that has lower read/write access latency relative to far memory and/or more symmetric read/write access latency (that is, having read times that are roughly equivalent to write times).
In some embodiments, near memory 121 has significantly lower write latency than far memory 122 but similar (for example, slightly lower or equal) read latency; for example, near memory 121 can be a volatile memory such as volatile random access memory (VRAM) and can comprise DRAM or another high-speed capacitor-based memory. Note, however, that the underlying principles of the invention are not limited to these specific types of memory. In addition, near memory 121 may have a relatively lower density and/or may be more expensive to manufacture than far memory 122.

[0032] In one embodiment, near memory 121 is configured between far memory 122 and the internal processor caches 120. In some of the embodiments described below, near memory 121 is configured as one or more memory-side caches (MSCs) 107-109 to mask the performance and/or usage limitations of far memory, including, for example, read/write latency limitations and memory degradation limitations. In these implementations, the combination of the MSCs 107-109 and far memory 122 operates at a performance level that approximates, is equivalent to, or exceeds a system that uses only DRAM as system memory. As discussed in detail below, although shown as a cache in Figure 1, near memory 121 may include modes in which it performs other functions, either in addition to, or instead of, performing the function of a cache.

[0033] Near memory 121 can be located on the processor chip (as cache 106) and/or located external to the processor chip (as caches 107-109) (for example, on a separate die located on the CPU package, located outside the CPU package with a high-bandwidth link to the CPU package, for example, on a dual in-line memory module (DIMM), a riser/mezzanine, or a computer motherboard).
Near memory 121 can be communicatively coupled to the processor 100 using a single high-bandwidth link or multiple high-bandwidth links, such as DDR or other high-bandwidth links (as described in detail below).

AN EXEMPLARY SYSTEM MEMORY ALLOCATION SCHEME

[0034] Figure 1 illustrates how the various levels of caches 101-109 are configured with respect to a system physical address (SPA) space 116-119 in embodiments of the invention. As mentioned, this embodiment comprises a processor 100 having one or more cores 101-104, with each core having its own dedicated upper-level cache (L0) 101a-104a and middle-level cache (MLC) (L1) 101b-104b. The processor 100 also includes a shared LLC 105. The operation of these various cache levels is well understood and will not be described in detail here.

[0035] The caches 107-109 illustrated in Figure 1 can be dedicated to a specific system memory address range or to a set of non-contiguous address ranges. For example, cache 107 is dedicated to acting as an MSC for system memory address range #1 116, and caches 108 and 109 are dedicated to acting as MSCs for non-overlapping portions of system memory address ranges #2 117 and #3 118. The latter implementation can be used for systems in which the SPA space used by the processor 100 is interleaved into an address space used by the caches 107-109 (for example, when configured as MSCs). In some embodiments, this latter address space is referred to as a memory channel address (MCA) space. In one embodiment, the internal caches 101a-106 perform caching operations for the entire SPA space.
[0036] System memory, as used here, is memory that is visible to and/or directly addressable by software running on the processor 100; the cache memories 101a-109 may operate transparently to the software in the sense that they do not form a directly addressable portion of the system address space, but the cores may also support execution of instructions that allow software to provide some control (configuration, policies, hints, etc.) over some or all of the caches. The subdivision of system memory into regions 116-119 can be performed manually as part of a system configuration process (for example, by a system designer) and/or can be performed automatically by software.

[0037] In one embodiment, the system memory regions 116-119 are implemented using far memory (for example, PCM) and, in some embodiments, near memory configured as system memory. System memory address range #4 represents an address range that is implemented with a higher-speed memory such as DRAM, which can be a near memory configured in a system memory mode (as opposed to a caching mode).

[0038] Figure 2 illustrates a memory/storage hierarchy 140 and different configurable operating modes for near memory 144 and NVRAM according to embodiments of the invention.
The memory/storage hierarchy 140 has multiple levels including (1) a cache level 150 that can include processor caches 150A (for example, caches 101A-105 in Figure 1) and optionally near memory acting as a cache for far memory 150B (in certain operating modes as described here), (2) a system memory level 151 that can include far memory 151B (for example, NVRAM such as PCM) when near memory is present (or NVRAM alone as system memory 174 when near memory is not present), and optionally near memory operating as system memory 151A (in certain operating modes as described here), (3) a mass storage level 152 that can include flash/magnetic/optical mass storage 152B and/or NVRAM mass storage 152A (for example, a portion of the NVRAM 142); and (4) a firmware memory level 153 that can include BIOS flash 170 and/or BIOS NVRAM 172 and optionally Trusted Platform Module (TPM) NVRAM 173.

[0039] As indicated, near memory 144 can be implemented to operate in a variety of different modes including: a first mode in which it operates as a cache for far memory (near memory as a cache for FM 150B); a second mode in which it operates as system memory 151A and occupies a portion of the SPA space (sometimes referred to as the near memory direct access mode); and one or more additional modes of operation such as a scratchpad memory 192 or as a write buffer 193. In some embodiments of the invention, the near memory is partitionable, where each partition can concurrently operate in a different one of the supported modes; and different embodiments can support configuration of the partitions (for example, sizes, modes) by hardware (for example, fuses, pins), firmware, and/or software (for example, through a set of programmable range registers within the MSC controller 124 in which, for example, different binary codes can be stored to identify each mode and partition).
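The per-partition mode configuration described in paragraph [0039] can be sketched in software terms. Everything in this sketch is an assumption for illustration: the mode encodings, the register layout, and the names are invented here; the patent only states that programmable range registers in the MSC controller 124 can hold binary codes identifying each mode and partition.

```python
# Hypothetical sketch of programmable range registers: each register maps a
# near-memory address range to one of the supported operating modes. The
# two-bit encodings below are illustrative, not from the patent.
MODES = {0b00: "cache_for_FM",    # near memory as cache for far memory 150B
         0b01: "system_memory",   # near memory direct access mode 151A
         0b10: "scratchpad",      # scratchpad memory 192
         0b11: "write_buffer"}    # write buffer 193

class RangeRegisters:
    def __init__(self):
        self.ranges = []                       # (start, end, mode_code)

    def program(self, start, end, mode_code):
        # Firmware or software programs one partition's range and mode code.
        self.ranges.append((start, end, mode_code))

    def mode_for(self, addr):
        # Decode which mode governs an access to a given near-memory address.
        for start, end, code in self.ranges:
            if start <= addr < end:
                return MODES[code]
        return None                            # address not in near memory
```

With two partitions programmed, the same near-memory device serves concurrently as a far-memory cache in one range and as directly addressable system memory in another, which is the concurrent-mode behavior [0039] describes.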
[0040] The system address space A 190 in Figure 2 is used to illustrate operation when the near memory is configured as an MSC for far memory 150B. In this configuration, the system address space A 190 represents the entire system address space (and the system address space B 191 does not exist). Alternatively, the system address space B 191 is used to show an implementation in which all or a portion of the near memory is allocated a portion of the system address space. In this embodiment, the system address space B 191 represents the range of the system address space allocated to the near memory 151A, and the system address space A 190 represents the range of the system address space allocated to the NVRAM 174.

[0041] In addition, when acting as a cache for far memory 150B, the near memory 144 can operate in various submodes under the control of the MSC controller 124. In each of these modes, the near memory address space (NMA) is transparent to software in the sense that the near memory does not form a directly addressable portion of the system address space. These modes include, but are not limited to, the following:

(1) Write-Back Caching Mode: In this mode, all or portions of the near memory acting as a cache for FM 150B are used as a cache for the NVRAM far memory (FM) 151B. While in write-back mode, each write operation is initially directed to the near memory acting as a cache for FM 150B (assuming that the cache line to which the write is directed is present in the cache). A corresponding write operation is performed to update the NVRAM FM 151B only when the cache line within the near memory acting as a cache for FM 150B is to be replaced by another cache line (in contrast to the write-through mode described below, in which each write operation is immediately propagated to the NVRAM FM 151B).

(2) Near Memory Bypass Mode: In this mode, all reads and writes bypass the NM acting as a cache for FM 150B and go directly to the NVRAM FM 151B.
Such a mode can be used, for example, when an application is not cache-friendly or requires data to be committed to persistence at the granularity of a cache line. In one embodiment, the caching performed by the processor caches 150A and by the NM acting as a cache for FM 150B operate independently of each other. Consequently, data can be cached in the NM acting as a cache for FM 150B that is not cached in the processor caches 150A (and that, in some cases, may not be permitted to be cached in the processor caches 150A), and vice versa. Thus, certain data that may be designated as non-cacheable in the processor caches can be cached within the NM acting as a cache for FM 150B.

(3) Near Memory Read Cache Write Bypass Mode: This is a variation of the above mode in which read caching of persistent data from the NVRAM FM 151B is permitted (that is, the persistent data is cached in the near memory acting as a cache for far memory 150B for read operations). This is useful when most of the persistent data is read-only and the application usage is cache-friendly.

(4) Near Memory Read Cache Write-Through Mode: This is a variation of the near memory read cache write bypass mode in which, in addition to read caching, write-hits are also cached. Each write to the near memory acting as a cache for FM 150B causes a write to the FM 151B. Thus, due to the write-through nature of the cache, cache line persistence is still guaranteed.

[0042] When operating in the near memory direct access mode, all or portions of the near memory as system memory 151A are directly visible to software and form part of the SPA space. Such memory can be completely under software control. Such a scheme can create a non-uniform memory access (NUMA) memory domain for software in which it obtains higher performance from near memory 144 relative to NVRAM system memory 174.
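The contrast between the write-back caching mode and the write-through variation described in [0041] can be made concrete with a small model. This is an illustrative sketch under assumed structures (a dictionary stands in for the near-memory cache and another for NVRAM FM 151B; the eviction choice is arbitrary rather than a real replacement policy): in write-back mode the far memory is updated only on eviction, whereas in write-through mode every write propagates immediately, which is why only the latter guarantees cache-line persistence at write time.

```python
# Illustrative model (not from the patent) of the two write policies of a
# near memory acting as a cache for far memory (NVRAM FM).
class NearMemoryCache:
    def __init__(self, capacity, mode="write_back"):
        self.capacity = capacity
        self.mode = mode
        self.lines = {}          # near memory: address -> cache line data
        self.far_memory = {}     # stands in for NVRAM FM 151B

    def write(self, addr, data):
        if len(self.lines) >= self.capacity and addr not in self.lines:
            # Evict an arbitrary line; a real MSC would pick by policy.
            victim, vdata = self.lines.popitem()
            self.far_memory[victim] = vdata    # write-back happens on eviction
        self.lines[addr] = data
        if self.mode == "write_through":
            self.far_memory[addr] = data       # persist immediately on write
```

In write-back mode a power loss between the write and the eviction would find stale data in far memory; in write-through mode the NVRAM copy is always current, at the cost of a far-memory write per cache write.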
As an example, and not by way of limitation, such usage can be employed for certain high-performance computing (HPC) and graphics applications that require very fast access to certain data structures.

[0043] In an alternative embodiment, the near memory direct access mode is implemented by pinning certain cache lines in near memory (that is, cache lines that have data that is also concurrently stored in NVRAM 142). Such pinning can be done effectively in larger, multi-way set-associative caches.

[0044] Figure 2 also illustrates that a portion of the NVRAM 142 can be used as firmware memory. For example, the BIOS NVRAM 172 portion can be used to store BIOS images (instead of, or in addition to, storing BIOS information in BIOS flash 170). The BIOS NVRAM 172 portion can be a portion of the SPA space and is directly addressable by software running on the processor cores 101-104, whereas the BIOS flash 170 is addressable through the I/O subsystem 115. As another example, a Trusted Platform Module (TPM) NVRAM 173 portion can be used to protect sensitive system information (for example, encryption keys).

[0045] Thus, as indicated, the NVRAM 142 can be implemented to operate in a variety of different modes, including as far memory 151B (for example, when near memory 144 is present/operating, whether the near memory is acting as a cache for the FM via the MSC control 124 or not (accessed directly after the cache(s) 101A-105 and without the MSC control 124)); as NVRAM system memory 174 alone (not as far memory because there is no near memory present/operating; and accessed without the MSC control 124); as NVRAM mass storage 152A; as BIOS NVRAM 172; and as TPM NVRAM 173. Although different embodiments can specify the NVRAM modes in different ways, Figure 3 describes the use of a decode table 333.

[0046] Figure 3 illustrates an exemplary computer system 300 on which embodiments of the invention can be implemented.
The computer system 300 includes a processor 310 and a memory/storage subsystem 380 with an NVRAM 142 used for system memory, mass storage, and optionally firmware memory. In one embodiment, the NVRAM 142 comprises the entire system memory and storage hierarchy used by the computer system 300 for storing data, instructions, states, and other persistent and non-persistent information. As previously discussed, NVRAM 142 can be configured to implement the roles in a typical memory and storage hierarchy of system memory, mass storage, and firmware memory, TPM memory, and the like. In the embodiment of Figure 3, NVRAM 142 is partitioned into FM 151B, NVRAM mass storage 152A, BIOS NVRAM 172, and TPM NVRAM 173. Storage hierarchies with different roles are also contemplated, and the application of NVRAM 142 is not limited to the roles described above.

[0047] As an example, operation while the near memory as a cache for FM 150B is in write-back caching mode is described. In one embodiment, while the near memory as a cache for FM 150B is in the write-back caching mode mentioned above, a read operation will first arrive at the MSC controller 124, which will perform a lookup to determine whether the requested data is present in the near memory acting as a cache for FM 150B (for example, using a tag cache 342). If present, it will return the data to the requesting CPU, core 101-104, or I/O device via the I/O subsystem 115. If the data is not present, the MSC controller 124 will send the request along with the system memory address to an NVRAM controller 332. The NVRAM controller 332 will use the decode table 333 to translate the system memory address to an NVRAM physical device address (PDA) and direct the read operation to that region of far memory 151B. In one embodiment, the decode table 333 includes an address indirection table (AIT) component that the NVRAM controller 332 uses to translate between system memory addresses and NVRAM PDAs.
In one embodiment, the AIT is updated as part of the wear leveling algorithm implemented to distribute memory access operations and thereby reduce wear on the NVRAM FM 151B. Alternatively, the AIT can be a separate table stored within the NVRAM controller 332. [0048] Upon receiving the requested data from the NVRAM FM 151B, the NVRAM controller 332 will return the requested data to the MSC controller 124, which will store the data in the MSC near memory acting as an FM cache 150B and also send the data to the requesting processor core 101-104, or I/O device through I/O subsystem 115. Subsequent requests for this data will be served directly from the near memory acting as an FM cache 150B until it is replaced by some other NVRAM FM data. [0049] As mentioned, a write operation also goes first to the MSC controller 124, which writes it into the MSC near memory acting as an FM cache 150B. In write-back caching mode, the data may not be sent directly to the NVRAM FM 151B when a write operation is received. For example, the data may be sent to the NVRAM FM 151B only when the location in the MSC near memory acting as an FM cache 150B in which the data is stored must be re-used to store data for a different system memory address. When this happens, the MSC controller 124 notices that the data is not current in the NVRAM FM 151B and will thus retrieve it from the near memory acting as an FM cache 150B and send it to the NVRAM controller 332. The NVRAM controller 332 looks up the PDA for the system memory address and then writes the data to the NVRAM FM 151B. [0050] In Figure 3, the NVRAM controller 332 is shown connected to the FM 151B, NVRAM mass storage 152A, and BIOS NVRAM 172 using three separate lines. This does not necessarily mean, however, that there are three separate physical buses or communication channels connecting the NVRAM controller 332 with these portions of NVRAM 142.
Rather, in some embodiments, a common memory bus or other type of bus is used to communicatively couple the NVRAM controller 332 to the FM 151B, NVRAM mass storage 152A, and BIOS NVRAM 172. For example, in one embodiment, the three lines in Figure 3 represent a bus, such as a memory bus (for example, a DDR3 or DDR4 bus, etc.), over which the NVRAM controller 332 implements a transactional (that is, out of order) protocol to communicate with NVRAM 142. The NVRAM controller 332 can also communicate with NVRAM 142 over a bus supporting a native transactional protocol, such as a PCI express bus, desktop management interface (DMI) bus, or any other type of bus using a transactional protocol and a sufficiently small payload size (for example, a cache line size such as 64 or 128 bytes). [0051] In one embodiment, computer system 300 includes an integrated memory controller (IMC) 331 that performs the central memory access control for processor 310 and is coupled to: 1) a memory-side cache (MSC) controller 124 to control access to near memory (NM) acting as a far memory cache 150B; and 2) an NVRAM controller 332 to control access to NVRAM 142. Although illustrated as separate units in Figure 3, the MSC controller 124 and the NVRAM controller 332 can logically form part of the IMC 331. [0052] In the illustrated embodiment, the MSC controller 124 includes a set of range registers 336 that specify the operating mode in use for the NM acting as a far memory cache 150B (for example, the write-back caching mode, near memory bypass mode, etc., described above). In the illustrated embodiment, DRAM 144 is used as the memory technology for the NM acting as a cache for far memory 150B.
In response to a memory access request, the MSC controller 124 can determine (depending on the operating mode specified in the range registers 336) whether the request can be served from the NM acting as a cache for FM 150B or whether the request must be sent to the NVRAM controller 332, which can then serve the request from the far memory (FM) portion 151B of NVRAM 142. [0053] In an embodiment where NVRAM 142 is implemented with PCMS, the NVRAM controller 332 is a PCMS controller that performs access with protocols consistent with PCMS technology. As discussed earlier, PCMS memory is inherently capable of being accessed at the granularity of a byte. However, the NVRAM controller 332 can access PCMS-based far memory 151B at a lower level of granularity, such as a cache line (for example, a 64-byte or 128-byte cache line) or any other level of granularity consistent with the memory subsystem. The underlying principles of the invention are not limited to any specific level of granularity for accessing PCMS-based far memory 151B. In general, however, when PCMS-based far memory 151B is used to form part of the system address space, the level of granularity will be higher than that traditionally used for other non-volatile storage technologies such as FLASH, which can only perform rewrite and erase operations at the level of a block (minimum size of 64 Kbytes for NOR FLASH and 16 Kbytes for NAND FLASH). [0054] In the illustrated embodiment, the NVRAM controller 332 can read configuration data to establish the previously described modes, sizes, etc. for NVRAM 142 from the decoding table 333, or, alternatively, can rely on the decoding results passed from the IMC 331 and I/O subsystem 315.
For example, at manufacturing time or in the field, computer system 300 can program the decoding table 333 to mark different regions of NVRAM 142 as system memory, mass storage exposed via SATA interfaces, mass storage exposed via USB Bulk Only Transport (BOT) interfaces, encrypted storage that supports TPM storage, among others. The means by which accesses are steered to the different portions of the NVRAM device 142 is a decode logic. For example, in one embodiment, the address range of each partition is defined in the decoding table 333. In one embodiment, when the IMC 331 receives an access request, the target address of the request is decoded to reveal whether the request is directed to memory, NVRAM mass storage, or I/O. If it is a memory request, the IMC 331 and/or the MSC controller 124 additionally determines from the target address whether the request is directed to the NM acting as a cache for FM 150B or to FM 151B. For FM 151B access, the request is sent to the NVRAM controller 332. The IMC 331 passes the request to the I/O subsystem 115 if the request is directed to I/O (for example, storage and non-storage I/O devices). The I/O subsystem 115 further decodes the address to determine whether the address points to the NVRAM mass storage 152A, BIOS NVRAM 172, or other storage and non-storage I/O devices. If this address points to the NVRAM mass storage 152A or BIOS NVRAM 172, the I/O subsystem 115 sends the request to the NVRAM controller 332. If this address points to the TPM NVRAM 173, the I/O subsystem 115 passes the request to the TPM 334 to perform secure access. [0055] The presence of a new memory architecture as described here provides an abundance of new possibilities. Although discussed in greater detail further below, some of these possibilities are quickly highlighted immediately below.
[0056] According to one possible implementation, NVRAM 142 acts as a total replacement or supplement for traditional DRAM technology in system memory. In one embodiment, NVRAM 142 represents the introduction of a second-level system memory (for example, the system memory can be seen as having a first-level system memory comprising near memory acting as a cache 150B (part of the DRAM device 340) and a second-level system memory comprising far memory (FM) 151B (part of NVRAM 142)). [0057] According to some embodiments, NVRAM 142 acts as a total replacement or supplement for the flash/magnetic/optical mass storage 152B. As previously described, in some embodiments, although the NVRAM 152A is capable of byte-level addressability, the NVRAM controller 332 can still access the NVRAM mass storage 152A in blocks of multiple bytes, depending on the implementation (for example, 64 Kbytes, 128 Kbytes, etc.). The specific way in which data is accessed from the NVRAM mass storage 152A by the NVRAM controller 332 can be transparent to software running on processor 310. For example, although the NVRAM mass storage 152A may be accessed differently from the flash/magnetic/optical mass storage 152B, the operating system can still view the NVRAM mass storage 152A as a standard mass storage device (for example, a serial ATA hard drive or other standard form of mass storage device). [0058] In an embodiment where the NVRAM mass storage 152A acts as a total replacement for the flash/magnetic/optical mass storage 152B, it is not necessary to use storage drivers for block-addressable storage access. Removing the storage driver overhead from storage access can increase access speed and save energy.
In alternative embodiments, where it is desired that the NVRAM mass storage 152A appear to the OS and/or applications as block-accessible and indistinguishable from the flash/magnetic/optical mass storage 152B, emulated storage drivers can be used to expose block-accessible interfaces (for example, universal serial bus (USB) bulk-only transport (BOT), 1.0; serial advanced technology attachment (SATA), 3.0; and the like) to the software for accessing the NVRAM mass storage 152A. [0059] In one embodiment, NVRAM 142 acts as a total replacement or supplement for firmware memory such as BIOS flash 362 and TPM flash 372 (illustrated with dotted lines in Figure 3 to indicate that they are optional). For example, NVRAM 142 may include a BIOS NVRAM portion 172 to supplement or replace the flash BIOS 362 and may include a TPM NVRAM portion 173 to supplement or replace the flash TPM 372. Firmware memory can also store persistent system states used by a TPM 334 to protect sensitive system information (e.g., encryption keys). In one embodiment, the use of NVRAM 142 for firmware memory removes the need for third-party flash parts to store codes and data that are crucial for system operations. [0060] Continuing then with a discussion of the system of Figure 3, in some embodiments, the architecture of computer system 100 may include multiple processors, although a single processor 310 is illustrated in Figure 3 for simplicity. Processor 310 can be any type of data processor, including a general purpose or special purpose central processing unit (CPU), an application-specific integrated circuit (ASIC) or a digital signal processor (DSP). For example, processor 310 can be a general purpose processor, such as a Core™ i3, i5, i7, 2 Duo and Quad, Xeon™, or Itanium™ processor, all of which are available from Intel Corporation of Santa Clara, California, USA.
Alternatively, processor 310 may be from another company, such as ARM Holdings, Ltd., of Sunnyvale, CA, USA, or MIPS Technologies of Sunnyvale, CA, etc. Processor 310 may be a special-purpose processor, such as, for example, a network or communication processor, compression engine, graphics processor, coprocessor, embedded processor, or the like. Processor 310 may be implemented on one or more chips included within one or more packages. Processor 310 may be part of, and/or may be implemented on, one or more substrates using any of a number of process technologies, such as, for example, BiCMOS, CMOS or NMOS. In the embodiment shown in Figure 3, processor 310 has a system-on-a-chip (SOC) configuration. [0061] In one embodiment, processor 310 includes an integrated graphics unit 311 that includes logic for executing graphics commands such as 3D or 2D graphics commands. Although the embodiments of the invention are not limited to any specific integrated graphics unit 311, in one embodiment, the graphics unit 311 is capable of executing industry-standard graphics commands such as those specified by the Open GL and/or Direct X application programming interfaces (APIs) (for example, OpenGL 4.1 and Direct X 11). [0062] Processor 310 may also include one or more cores 101-104, although, again, a single core is illustrated in Figure 3 for the sake of clarity. In many embodiments, the core(s) 101-104 include internal functional blocks such as one or more execution units, retirement units, a set of general purpose and special purpose registers, etc. If the core(s) are multi-threaded or hyper-threaded, then each hardware thread can also be considered a logical core. Cores 101-104 can be homogeneous or heterogeneous in terms of architecture and/or instruction set. For example, some of the cores may be in order while others are out of order.
As another example, two or more of the cores may be able to execute the same instruction set, while others may be able to execute only a subset of that instruction set or a different instruction set. Processor 310 may also include one or more caches, such as cache 313, which can be implemented as an SRAM and/or a DRAM. In many embodiments that are not shown, additional caches other than cache 313 are implemented so that there are multiple levels of cache between the execution units in the core(s) 101-104 and the memory devices 150B and 151B. For example, the set of shared caches may include an upper level cache, such as a level 1 (L1) cache, intermediate level caches, such as level 2 (L2), level 3 (L3), level 4 (L4), or other cache levels, a last level cache (LLC), and/or different combinations thereof. In different embodiments, cache 313 can be apportioned in different ways, and can be one of many different sizes in different embodiments. For example, cache 313 can be an 8 megabyte (MB) cache, a 16 MB cache, etc. Additionally, in different embodiments, the cache can be a direct mapped cache, a fully associative cache, a multi-way set associative cache, or a cache with another type of mapping. In other embodiments that include multiple cores, cache 313 may include a large portion shared among all cores or may be divided into several separate functional slices (for example, one slice for each core). Cache 313 can also include a portion shared among all cores and several other portions that are separate functional slices per core. [0064] Processor 310 may also include a home agent 314 that includes the components coordinating and operating the cores 101-104. The home agent unit 314 can include, for example, a power control unit (PCU) and a display unit. The PCU can be or include the logic and components necessary to regulate the power state of the core(s) 101-104 and the integrated graphics unit 311.
The display unit is for driving one or more externally connected displays. [0065] As mentioned, in some embodiments, processor 310 includes an integrated memory controller (IMC) 331, near memory cache (MSC) controller 124, and NVRAM controller 332, all of which can be on the same chip as processor 310, or on a separate chip and/or package connected to processor 310. The DRAM device 144 can be on the same chip as, or on a different chip from, the IMC 331 and MSC controller 124; thus, one chip can have processor 310 and DRAM device 144; one chip can have processor 310 and another the DRAM device 144 (these chips may be in the same or different packages); one chip can have the core(s) 101-104 and another the IMC 331, MSC controller 124 and DRAM 144 (these chips may be in the same or different packages); one chip can have the core(s) 101-104, another the IMC 331 and MSC controller 124, and another the DRAM 144 (these chips may be in the same or different packages); etc. [0066] In some embodiments, processor 310 includes an I/O subsystem 115 coupled to the IMC 331. The I/O subsystem 115 enables communication between processor 310 and the following serial or parallel I/O devices: one or more networks 336 (such as a local area network, wide area network or the Internet), storage I/O devices (such as flash/magnetic/optical mass storage 152B, BIOS flash 362, TPM flash 372) and one or more non-storage I/O devices 337 (such as display, keyboard, speaker, and the like). The I/O subsystem 115 may include a platform controller hub (PCH) (not shown) that further includes several I/O adapters 338 and other I/O circuitry to provide access to the storage and non-storage I/O devices and networks. To accomplish this, the I/O subsystem 115 can have at least one integrated I/O adapter 338 for each I/O protocol used. The I/O subsystem 115 can be on the same chip as processor 310, or on a separate chip and/or separate package connected to processor 310.
[0067] I/O adapters 338 convert a host communication protocol used inside processor 310 into a protocol compatible with specific I/O devices. For the flash/magnetic/optical mass storage 152B, some of the protocols that I/O adapters 338 can convert include Peripheral Component Interconnect (PCI)-Express (PCI-E), 3.0; USB, 3.0; SATA, 3.0; Small Computer System Interface (SCSI), Ultra-640; and Institute of Electrical and Electronics Engineers (IEEE) 1394 Firewire; among others. For the BIOS flash 362, some of the protocols that I/O adapters 338 can convert include Serial Peripheral Interface (SPI) and Microwire, among others. Additionally, there may be one or more wireless protocol I/O adapters. Examples of wireless protocols, among others, are those used in personal area networks, such as IEEE 802.15 and Bluetooth, 4.0; wireless local area networks, such as IEEE 802.11-based wireless protocols; and cellular protocols. [0068] In some embodiments, the I/O subsystem 115 is coupled to a TPM control 334 to control access to persistent system states, such as secure data, encryption keys, platform configuration information and the like. In one embodiment, these persistent system states are stored in a TPM NVRAM 173 and accessed through the NVRAM controller 332. [0069] In one embodiment, the TPM 334 is a secure microcontroller with cryptographic functionality. The TPM 334 has a number of trust-related capabilities; for example, a SEAL capability to ensure that data protected by a TPM is available only to that same TPM. The TPM 334 can protect data and keys (for example, secrets) using its encryption capabilities. In one embodiment, the TPM 334 has a unique, secret RSA key, which allows it to authenticate hardware devices and platforms. For example, the TPM 334 can verify that a system seeking access to data stored on computer system 300 is the expected system. The TPM 334 is also able to report the integrity of the platform (for example, computer system 300).
This allows an external source (for example, a server on a network) to determine the trustworthiness of the platform, but does not prevent user access to the platform. [0070] In some embodiments, the I/O subsystem 315 also includes a management engine (ME) 335, which is a microprocessor that allows a system administrator to monitor, maintain, update, upgrade and repair computer system 300. In one embodiment, a system administrator can remotely configure computer system 300 by editing the contents of the decoding table 333 through the ME 335 over the networks 336. [0071] For convenience of explanation, the application may sometimes refer to NVRAM 142 as a PCMS device. A PCMS device includes multi-layer (vertically stacked) PCM cell arrays that are non-volatile, have low power consumption, and are modifiable at the bit level. As such, the terms NVRAM device and PCMS device can be used interchangeably in the following discussion. However, it should be noted, as discussed above, that different technologies in addition to PCMS can also be used for NVRAM 142. [0072] It should be understood that a computer system can use NVRAM 142 for system memory, mass storage, firmware memory and/or other memory and storage purposes even if the processor of that computer system does not have all the components of processor 310 described above, or has more components than processor 310. [0073] In the specific embodiment shown in Figure 3, the MSC controller 124 and the NVRAM controller 332 are located on the same chip or package (referred to as the CPU package) as processor 310. In other embodiments, the MSC controller 124 and/or NVRAM controller 332 can be located off-chip or outside the CPU package, coupled to processor 310 or the CPU package through a bus such as a memory bus (such as a DDR bus (e.g., DDR3, DDR4, etc.)), a PCI express bus, a desktop management interface (DMI) bus, or any other type of bus.
FAR MEMORY USED BY MULTI-THREADED TRANSACTIONAL SOFTWARE AND NON-TRANSACTIONAL SOFTWARE [0074] Processor designers are currently designing enhanced instruction sets that enable transactional support for multi-threaded software. In traditional (that is, non-transactional) multi-threaded software, programs protect data with locks. Only one thread can hold a lock at any time, so it can ensure that no other threads are modifying the data at the same time. This tends to be pessimistic: the thread with the lock prevents any other threads from taking the lock, even if they just want to read the data, or to make a non-conflicting update to it. [0075] With transactional support, with reference to Figure 4, threads no longer need to take locks when manipulating data. They initiate a transaction 401, make their changes 402, and, when they are finished, commit the transaction 403 or roll back 404 the changes made in step 402 if the transaction cannot be committed. While the thread is making its changes 402 during the course of the transaction, with reference to Figure 5, special hardware 570 within processor 510 notes any/all cache 513 and near memory 550B locations from which the thread reads or to which the thread writes. [0076] Typically, any/all data writes made by a transaction are present in cache, simply because a cache holds the most recent changes in the system. That is, if a transaction needs to change a data item, the data item is called up from deeper storage if it is not already cached, changed, and then kept in cache. Thus, assuming that the amount of data changes made by a transaction is limited to less than the cache space available for each data address, all changes made by a transaction will be present in cache. Hardware inside the processor prevents writeback of these changed data items to persistence storage until the transaction is committed. In a first embodiment, the cache referred to above includes the processor caches and near memory.
In a second embodiment, the cache referred to above includes only the processor caches (that is, near memory is not included). For simplicity, the remainder of the document will refer mainly to the first embodiment. [0077] In one embodiment, there is a special hardware instance 570 for each CPU core 501-504 within processor 510 and/or each instruction execution pipeline within each CPU core within processor 510. Each special hardware instance 570 (for example, as implemented with logic circuitry) observes the core/pipeline that is executing the transactional thread and notes the transaction's reads from and writes to near memory and cache as described above. Note that some caching levels within processor 510 may serve multiple cores (for example, an upper level cache) while other caching levels within processor 510 may serve only a single core (for example, an L1 cache). [0078] When the transaction is ready to be committed, the special hardware 570 verifies that, while the transaction was executing, no other thread made any changes to, or read from, those same locations. If this condition is met, the transaction is committed 403 and the thread continues. Here, committing the changes means that the changes are written to persistence storage. If this condition is not met, the transaction is aborted, and all of its changes are undone 404. In one embodiment, to undo the changes, new data representing the state of the data before any changes made by the transaction is called up from the persistence media and rewritten into cache, or the cache lines that were changed are invalidated. The thread can then retry the operation, try a different strategy (for example, one that uses locks), or give up completely.
[0079] In one implementation, NVRAM far memory 551B corresponds to the persistence storage into which committed data changes are written upon the commitment of a transaction, while the near memory 550B and any/all caches 513 of the processor correspond to the caching levels where a thread can make changes before the commitment of its transaction. [0080] The concept of persistence storage, however, can in several cases be extended to other types of software processes that do not technically satisfy the definition of a transaction as discussed above. Persistence storage, according to several different information paradigms, can be storage of written data that reflects the formally recognized state of some process or data structure (and is therefore globally visible, for example), and/or that has some expectation of being needed for an extended period of time (for example, across multiple on/off cycles of the computer system). Notably, many such software processes may also choose to implement persistence storage in NVRAM far memory 551B. [0081] For those non-transactional software processes that recognize the existence of persistence storage, the software has to take precautions to ensure that modified data items that need to be persisted are flushed from cache and stored in the persistence media before any subsequent changes are made to them. Here, for example, if a change is made to a data item and the software views the change as needing to be reflected in persistence storage, the software will insert a cache line flush instruction (for example, CLFLUSH) followed by a memory fence instruction (for example, MFENCE). The cache line flush instruction will cause the recently changed data to be written back to the persistence storage 551B.
The memory fence instruction will prevent other operations of the same thread from accessing the data until it has been written to the persistence storage 551B. [0082] In more complicated approaches, the thread's software includes complicated bookkeeping tasks to track which cached data items need to be persisted to the persistence storage 551B. Here, for example, certain data items may be recognized by the thread's software as requiring persistence; the bookkeeping software will track those data items and, at an appropriate time in the code's execution, execute the appropriate cache line flush and memory fence instructions. [0083] Figure 6 shows an improved approach in which the special hardware 570 of Figure 5 is also used, not only to support the rollback of transactions as discussed above with respect to Figure 4, but also to eliminate the need for the software bookkeeping function described above. [0084] As seen in Figure 6, the software is only asked to define a persistent code region. This definition is marked at the beginning of the region with a PBEGIN instruction 601 and at the end of the region with a PEND instruction 604. The PBEGIN instruction essentially activates 602 the functionality of the special hardware 570. While the code is running after the PBEGIN instruction, the special hardware 570 monitors 603 which cache lines are changed. When the PEND instruction 604 is executed, it causes the cache lines identified by the special hardware 570 to be flushed 605 to persistence storage 551B and deactivates the special hardware 570. No other instructions are permitted to execute after the PEND instruction until all the cache lines have been flushed, thereby effecting a memory fence. [0085] Thus the special hardware 570 monitors cache accesses not only during transactional operations but also during non-transactional operations.
Figure 5 shows a representation of an instruction execution pipeline 580 within a core that is coupled to the special hardware 570. Here, the coupling is used to activate the special hardware 570 in response to a PBEGIN instruction and deactivate the special hardware in response to a PEND instruction. The instruction execution pipeline is also designed with logic to prevent the next instruction from being issued until the cache flush is complete. The cache flush logic is also coupled to the instruction execution pipeline and the special hardware, but is not drawn for convenience. The cache flush logic is activated by the PEND instruction and queries the special hardware 570 to learn which cache lines need to be flushed. Other features of Figure 5 are described above with respect to Figure 3. [0086] Figure 7 shows an integrated methodology that illustrates the two functions of the specialized hardware. Unless a transactional operation begins or the PBEGIN instruction is executed, the specialized hardware 570 remains inactive 701. [0087] If a transactional operation starts, the specialized hardware 570 is enabled and begins to monitor which cache lines are modified by the transaction 702. When the transaction is complete, the transactional hardware 571 inside the processor checks whether any other threads have written to, or read from, those same cache lines 703. If none have been written to or read from, the changes are committed 704 to the NVRAM far memory 551B; otherwise, the cache lines are replaced with content from the persistence NVRAM 551B or invalidated 705. [0088] If a PBEGIN instruction is executed, the specialized hardware 570 is enabled and starts monitoring which cache lines are modified by the software process 706. When the PEND instruction is executed, all modified cache data is written back to the persistence NVRAM 551B and no other instructions are permitted to execute until the writeback is completed 707.
[0089] Figure 8 shows a compilation process to be performed by a compiler. As seen in Figure 8, the compilation process identifies 801 the beginning of a code region in which any data changes made by the code must be persisted to persistence storage. In response to the identification 801, the compilation process inserts 802 a PBEGIN instruction into the program code, or marks the location in the code where the PBEGIN instruction is to be inserted. The compilation process also identifies 803 the beginning of a code region, following the point where the PBEGIN instruction is (or is to be) inserted, whose data changes do not need to be persisted. In response to the identification 803 of the second code region, the compilation process inserts a PEND instruction (or marks where a PEND instruction is to be inserted) into the program code after the last data change that needs to be persisted, but before the first data change that need not be persisted. [0090] The processes taught by the discussion above can be performed with program code such as machine-executable instructions that cause a machine (such as a virtual machine, a general-purpose CPU processor or processing core disposed on a semiconductor chip, or a special-purpose processor disposed on a semiconductor chip) to perform certain functions. Alternatively, these functions can be performed by specific hardware components that contain hardwired logic to perform the functions, or by any combination of programmed computer components and custom hardware components. [0091] A storage medium can be used to store program code. A storage medium that stores program code can be embodied as, but is not limited to, one or more memories (for example, one or more flash memories, random access memories (static, dynamic or otherwise)), optical discs, CD-ROMs, DVD-ROMs, EPROMs, EEPROMs, magnetic or optical cards or other machine-readable media suitable for storing electronic instructions.
Program code can also be downloaded from a remote computer (for example, a server) to a requesting computer (for example, a client) via data signals embodied in a propagation medium (for example, via a communication link such as a network connection).

[0092] In the preceding specification, the invention has been described with reference to specific exemplary embodiments thereof. It will, however, be evident that various modifications and changes may be made thereto without departing from the broader spirit and scope of the invention as set forth in the appended claims. The specification and drawings are, accordingly, to be regarded in an illustrative rather than a restrictive sense.
Claims

1. Processor, characterized by comprising:
an interface for non-volatile random access memory;
a set of logic circuits to:
identify cache lines modified by a transaction that views the non-volatile random access memory as the transaction's persistence store; and
identify cache lines modified by a software process other than a transaction that also views the non-volatile random access memory as persistence storage.

2. Processor according to claim 1, characterized by the fact that the non-volatile random access memory is PCM.

3. Processor according to claim 2, characterized by the fact that the non-volatile random access memory is PCMS.

4. Processor according to claim 1, characterized by the fact that it further comprises an instruction execution pipeline to execute a first instruction of the software process that identifies a starting point of the software process code and to execute a second instruction of the software process that identifies an end point of the software process code, where data changes made by the software process between the starting point and the end point must be persisted to the non-volatile random access memory, the instruction execution pipeline being coupled to the logic circuitry to enable the logic circuitry in response to the first instruction and disable the logic circuitry in response to the second instruction.

5. Processor according to claim 4, characterized by the fact that the processor comprises cache flush circuitry coupled to the logic circuitry and to the instruction execution pipeline.

6. Processor according to claim 5, characterized by the fact that the instruction execution pipeline will not execute a next instruction until the cache flush circuitry has completed flushing the cache lines identified by the logic circuits to the non-volatile random access memory in response to the second instruction.

7.
Method, characterized by comprising:
executing a transaction on a processor, where the execution of the transaction includes making changes to data items, the changes being kept in one or more levels of cache storage;
monitoring where the changes reside in the one or more cache levels with processor hardware;
determining whether the changes are to be committed to non-volatile random access memory, and writing to, or invalidating against, the non-volatile random access memory upon the committing or rolling back of the changes;
executing a software process on the processor that marks a start and an end of a region of the software process where data changes made by the software process must be persisted;
monitoring the changes made by the software process between the start and the end with the hardware; and
saving the changes made by the software process to the non-volatile random access memory upon the end of the region being reached.

8. Method according to claim 7, characterized by the fact that the start is marked with a first instruction and the end is marked with a second instruction.

9. Method according to claim 7, characterized by the fact that the non-volatile random access memory is implemented with PCM.

10. Method according to claim 9, characterized by the fact that the non-volatile random access memory is implemented with PCMS.

11. Method according to claim 7, characterized by the fact that the method further comprises enabling the hardware in response to the first instruction.

12. Method according to claim 11, characterized by the fact that the method further comprises disabling the hardware in response to the second instruction.

13. Machine-readable storage media having program code stored thereon which, when processed by a computer system, causes a method to be performed, the method characterized by comprising:
identifying a region of code whose data changes need to be persisted to persistence storage; and
marking the beginning and the end of the region with first and second instructions, respectively.

14.
Machine-readable media according to claim 13, characterized by the fact that the first instruction causes a processor to enable hardware that monitors data changes.

15. Machine-readable media according to claim 14, characterized by the fact that the second instruction disables the hardware.

16. Machine-readable media according to claim 13, characterized by the fact that the second instruction causes the data changes to be flushed from one or more cache levels to the persistence storage.

17. Machine-readable media according to claim 16, characterized by the fact that the second instruction prevents any further instructions from being executed until the data changes have finished being flushed to the persistence store.

18. Machine-readable media according to claim 13, characterized by the fact that the persistence storage is non-volatile random access memory.

19. Machine-readable media according to claim 18, characterized by the fact that the non-volatile random access memory is PCM.

20. Machine-readable media according to claim 19, characterized by the fact that the non-volatile random access memory is PCMS.

[Figure 1: processor 100 with cores 101-104, each having L0 (101a-104a) and L1 (101b-104b) caches; internal processor caches (for example, SRAM) 120; LLC 105; near memory acting as a distant memory cache (for example, DRAM) 121; distant memory (for example, PCM) and, optionally, high-speed memory (for example, DRAM) configured as system memory 122; system memory address ranges No. 1-4 (111, 117, 118, 119).]
Similar technologies:
Publication number | Publication date | Patent title
US9817758B2 | 2017-11-14 | Instructions to mark beginning and end of non transactional code region requiring write back to persistent storage
US10719443B2 | 2020-07-21 | Apparatus and method for implementing a multi-level memory hierarchy
US11200176B2 | 2021-12-14 | Dynamic partial power down of memory-side cache in a 2-level memory hierarchy
US10102126B2 | 2018-10-16 | Apparatus and method for implementing a multi-level memory hierarchy having different operating modes
GB2514023B | 2019-07-03 | System and method for intelligently flushing data from a processor into a memory subsystem
US9317429B2 | 2016-04-19 | Apparatus and method for implementing a multi-level memory hierarchy over common memory channels
US20140229659A1 | 2014-08-14 | Thin translation for system access of non volatile semiconductor storage as random access memory
US10180796B2 | 2019-01-15 | Memory system
US20170109074A1 | 2017-04-20 | Memory system
Patent family:
Publication number | Publication date
US9817758B2 | 2017-11-14
GB201404562D0 | 2014-04-30
JP2014182836A | 2014-09-29
US20170123980A1 | 2017-05-04
KR20160130199A | 2016-11-10
GB2515146B | 2016-02-17
CN107193756B | 2020-12-01
DE102014003668A1 | 2014-09-18
JP6371431B2 | 2018-08-08
US20140281240A1 | 2014-09-18
KR20140113605A | 2014-09-24
CN107193756A | 2017-09-22
GB2515146A | 2014-12-17
JP2017130229A | 2017-07-27
JP6121010B2 | 2017-04-26
KR101779723B1 | 2017-09-18
CN104050112B | 2017-06-20
JP2016129041A | 2016-07-14
US9547594B2 | 2017-01-17
KR101673280B1 | 2016-11-07
CN104050112A | 2014-09-17
Cited references:
Publication number | Filing date | Publication date | Applicant | Patent title
US5428761A | 1992-03-12 | 1995-06-27 | Digital Equipment Corporation | System for achieving atomic non-sequential multi-word operations in shared memory
DE69330768T2 | 1992-04-24 | 2002-07-04 | Compaq Computer Corp | Method and device for operating a multiprocessor computer system with cache memories
JPH1196062A | 1997-09-19 | 1999-04-09 | Hitachi Ltd | Directory access method
US7269693B2 | 2003-02-13 | 2007-09-11 | Sun Microsystems, Inc. | Selectively monitoring stores to support transactional program execution
US7882339B2 | 2005-06-23 | 2011-02-01 | Intel Corporation | Primitives to enhance thread-level speculation
US7937534B2 | 2005-12-30 | 2011-05-03 | Rajesh Sankaran Madukkarumukumana | Performing direct cache access transactions based on a memory access data structure
US20070198979A1 | 2006-02-22 | 2007-08-23 | David Dice | Methods and apparatus to implement parallel transactions
US8180977B2 | 2006-03-30 | 2012-05-15 | Intel Corporation | Transactional memory in out-of-order processors
US20080005504A1 | 2006-06-30 | 2008-01-03 | Jesse Barnes | Global overflow method for virtualized transactional memory
JP4767361B2 | 2008-03-31 | 2011-09-07 | Panasonic Corporation | Cache memory device, cache memory system, processor system
US8219741B2 | 2008-10-24 | 2012-07-10 | Microsoft Corporation | Hardware and operating system support for persistent memory on a memory bus
US8627017B2 | 2008-12-30 | 2014-01-07 | Intel Corporation | Read and write monitoring attributes in transactional memory systems
US8489864B2 | 2009-06-26 | 2013-07-16 | Microsoft Corporation | Performing escape actions in transactions
GB2484416B | 2009-06-26 | 2015-02-25 | Intel Corp | Optimizations for an unbounded transactional memory system
US8316194B2 | 2009-12-15 | 2012-11-20 | Intel Corporation | Mechanisms to accelerate transactions using buffered stores
KR101639672B1 | 2010-01-05 | 2016-07-15 | Samsung Electronics Co., Ltd. | Unbounded transactional memory system and method for operating thereof
JP2011154547A | 2010-01-27 | 2011-08-11 | Toshiba Corp | Memory management device and memory management method
CN101777154A | 2010-02-01 | 2010-07-14 | Inspur Group Shandong General Software Co., Ltd. | Persistence method of workflow data in workflow management system
US20110208921A1 | 2010-02-19 | 2011-08-25 | Pohlack Martin T | Inverted default semantics for in-speculative-region memory accesses
US8402227B2 | 2010-03-31 | 2013-03-19 | Oracle International Corporation | System and method for committing results of a software transaction using a hardware transaction
US20120079245A1 | 2010-09-25 | 2012-03-29 | Cheng Wang | Dynamic optimization for conditional commit
US8352688B2 | 2010-11-15 | 2013-01-08 | Advanced Micro Devices, Inc. | Preventing unintended loss of transactional data in hardware transactional memory systems
US8788794B2 | 2010-12-07 | 2014-07-22 | Advanced Micro Devices, Inc. | Programmable atomic memory using stored atomic procedures
US20130013899A1 | 2011-07-06 | 2013-01-10 | International Business Machines Corporation | Using Hardware Transaction Primitives for Implementing Non-Transactional Escape Actions Inside Transactions
EP3451176A1 | 2011-09-30 | 2019-03-06 | Intel Corporation | Apparatus and method for implementing a multi-level memory hierarchy having different operating modes
US20130091331A1 | 2011-10-11 | 2013-04-11 | Iulian Moraru | Methods, apparatus, and articles of manufacture to manage memory
CN103999057B | 2011-12-30 | 2016-10-26 | Intel Corporation | Metadata management and support for phase change memory with switch
US9471622B2 | 2012-04-30 | 2016-10-18 | International Business Machines Corporation | SCM-conscious transactional key-value store
US20140040588A1 | 2012-08-01 | 2014-02-06 | International Business Machines Corporation | Non-transactional page in memory
US9946540B2 | 2011-12-23 | 2018-04-17 | Intel Corporation | Apparatus and method of improved permute instructions with multiple granularities
JP6166616B2 | 2013-08-07 | 2017-07-19 | Toshiba Corporation | Information processing method, information processing apparatus, and program
US20150095578A1 | 2013-09-27 | 2015-04-02 | Kshitij Doshi | Instructions and logic to provide memory fence and store functionality
US10019354B2 | 2013-12-09 | 2018-07-10 | Intel Corporation | Apparatus and method for fast cache flushing including determining whether data is to be stored in nonvolatile memory
GB2529148B | 2014-08-04 | 2020-05-27 | Advanced Risc Mach Ltd | Write operations to non-volatile memory
US10489158B2 | 2014-09-26 | 2019-11-26 | Intel Corporation | Processors, methods, systems, and instructions to selectively fence only persistent storage of given data relative to subsequent stores
US10387259B2 | 2015-06-26 | 2019-08-20 | Intel Corporation | Instant restart in non volatile system memory computing systems with embedded programmable data checking
US10078448B2 | 2015-07-08 | 2018-09-18 | Samsung Electronics Co., Ltd. | Electronic devices and memory management methods thereof
US9792224B2 | 2015-10-23 | 2017-10-17 | Intel Corporation | Reducing latency by persisting data relationships in relation to corresponding data in persistent memory
US9824419B2 | 2015-11-20 | 2017-11-21 | International Business Machines Corporation | Automatically enabling a read-only cache in a language in which two arrays in two different variables may alias each other
JP6740607B2 | 2015-12-18 | 2020-08-19 | Fujitsu Limited | Simulation program, information processing device, simulation method
US10318295B2 | 2015-12-22 | 2019-06-11 | Intel Corporation | Transaction end plus commit to persistence instructions, processors, methods, and systems
CN105912486B | 2016-04-27 | 2019-03-29 | Lenovo Ltd. | Information processing method and processor
KR20170122466A | 2016-04-27 | 2017-11-06 | SK Hynix Inc. | Memory system and operating method of memory system
JP2018049385A | 2016-09-20 | 2018-03-29 | Toshiba Memory Corporation | Memory system and processor system
CN108447903A | 2017-02-16 | 2018-08-24 | Fuji Electric Co., Ltd. | Semiconductor device
US10621103B2 | 2017-12-05 | 2020-04-14 | Arm Limited | Apparatus and method for handling write operations
US10481958B2 | 2017-12-29 | 2019-11-19 | Intel IP Corporation | Speculative execution tag for asynchronous DRAM refresh
US11099995B2 | 2018-03-28 | 2021-08-24 | Intel Corporation | Techniques for prefetching data to a first level of memory of a hierarchical arrangement of memory
Legal status:
2018-02-14 | B03A | Publication of a patent application or of a certificate of addition of invention [chapter 3.1 patent gazette]
2020-03-03 | B06U | Preliminary requirement: requests with searches performed by other patent offices: procedure suspended [chapter 6.21 patent gazette]
2022-01-25 | B09A | Decision: intention to grant [chapter 9.1 patent gazette]
Priority:
Application number | Filing date | Patent title
US13/843,760 | US9547594B2 | 2013-03-15 | 2013-03-15 | Instructions to mark beginning and end of non transactional code region requiring write back to persistent storage
US13/843,760 | 2013-03-15